Contextual retrieval

2025-09-13 (modified: 2025-09-17)

A method for improving the “retrieval” step of RAG, proposed by Anthropic in September 2024.

The method is called “Contextual Retrieval” and uses two sub-techniques: Contextual Embeddings and Contextual BM25. This method can reduce the number of failed retrievals by 49% and, when combined with reranking, by 67%.

Problems with traditional RAG and how to address them:

When a user enters a query such as “Error code TS-999”, RAG that relies on embeddings alone can miss an important exact-match term like “TS-999” at the retrieval step. Using BM25 alongside embeddings and merging the two result sets avoids this problem.
Given a chunk such as “The company’s revenue grew by 3% over the previous quarter”, there is no way to tell which company “the company” is or when “the previous quarter” was. After chunking, using a cheap, fast LLM such as Haiku to prepend context (e.g. “This chunk is from an SEC filing on ACME corp’s performance in Q2 2023; the previous quarter’s revenue was $314 million. The company’s revenue grew by 3% over the previous quarter.”) mitigates this problem.
Reordering the retrieved chunks with a reranker such as the Cohere reranker improves performance further.
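A minimal sketch of the first two ideas, under stated assumptions. The note only says that the BM25 and embedding results are merged, so reciprocal rank fusion (with the conventional constant k = 60) is used here as one common way to do that, not necessarily the one Anthropic uses. `fake_describe` is a hypothetical stand-in for the call to a cheap LLM, and the chunk IDs and rankings are made up for illustration.

```python
from collections import defaultdict
from typing import Callable


def contextualize(chunk: str, describe: Callable[[str], str]) -> str:
    """Prepend a short context sentence to a chunk before it is indexed."""
    return f"{describe(chunk)} {chunk}"


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into a single ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def fake_describe(chunk: str) -> str:
    # Hypothetical stand-in: a real pipeline would ask an LLM (e.g. Haiku)
    # to write this sentence from the full document plus the chunk.
    return "This chunk is from an SEC filing on ACME corp's performance in Q2 2023."


contextualized = contextualize(
    "The company's revenue grew by 3% over the previous quarter.",
    fake_describe,
)

# Hypothetical results for the query "Error code TS-999":
# BM25 finds the exact-match chunk, the embedding search misses it.
bm25_ranking = ["ts999_doc", "faq", "intro"]
embedding_ranking = ["faq", "intro", "pricing"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(contextualized)
print(fused)
```

The same contextualized text would be fed to both the embedding index and the BM25 index. The fused list still contains `ts999_doc` even though the embedding search did not return it, and it is the list a reranker would then reorder.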

Performance improvements:

Contextual Embeddings reduced the top-20-chunk retrieval failure rate by 35% (5.7% → 3.7%).
Combining Contextual Embeddings and Contextual BM25 reduced the top-20-chunk retrieval failure rate by 49% (5.7% → 2.9%).
Reranked Contextual Embedding and Contextual BM25 reduced the top-20-chunk retrieval failure rate by 67% (5.7% → 1.9%).
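The percentages above are relative reductions from the 5.7% baseline failure rate, not percentage-point differences. A small sketch of that arithmetic, using only the figures quoted above (the function name and labels are just for illustration):

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which the failure rate dropped, relative to the baseline."""
    return (before - after) / before


baseline = 5.7  # top-20-chunk retrieval failure rate (%) without the method
results = {
    "Contextual Embeddings": 3.7,
    "Contextual Embeddings + Contextual BM25": 2.9,
    "Reranked Contextual Embeddings + Contextual BM25": 1.9,
}

for label, rate in results.items():
    print(f"{label}: {relative_reduction(baseline, rate):.0%}")
```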